An Approach for Grammatical Constructs of Sanskrit Language using Morpheme and Parts-of-Speech Tagging by Sanskrit Corpus
Abstract
Sanskrit has been the oriental language of India for thousands of years, and it is the basis of most Indian languages. Statistical natural language processing relies on corpora (singular: corpus). A language corpus is a collection of written or spoken texts, gathered in an organized way and stored in electronic form for linguistic research. It serves as a resource that language researchers can systematically ‘consult’. This paper presents an approach for automatically tagging Sanskrit corpora at the word and morpheme levels, and describes the tag sets used at each level.
Keywords: Part-of-Speech, tagging, noun, verb, parsing, lexical analysis.
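The abstract describes tagging at two levels, the word and the morpheme. As a rough illustration only, the Python sketch below shows how a suffix-based analyzer could produce both levels in one pass. The ending table, the tag names (`N`, `V`, `NOM.SG`, and so on), the helper names `tag_word` and `tag_sentence`, and the use of Harvard-Kyoto transliteration (`A` for long a, `H` for visarga) are all hypothetical choices for this example; they are not the tag sets or method defined in the paper.

```python
from typing import List, NamedTuple, Tuple

# HYPOTHETICAL toy table: ending -> (word-level POS tag, morpheme-level tag).
# Input words are assumed to be in Harvard-Kyoto transliteration.
# These tags are illustrative only, NOT the paper's tag sets.
ENDINGS = {
    "nti": ("V", "3.PL.PRES"),
    "ti": ("V", "3.SG.PRES"),
    "H": ("N", "NOM.SG"),
    "m": ("N", "ACC.SG|NOM.SG.NEUT"),  # -m is ambiguous; both readings are kept
}


class Tagged(NamedTuple):
    word: str
    pos: str                          # word-level tag
    morphemes: List[Tuple[str, str]]  # morpheme-level analysis: (morpheme, tag)


def tag_word(word: str) -> Tagged:
    """Tag one word at both levels by matching the longest known ending."""
    # Longer endings are tried first so that "nti" wins over "ti".
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            pos, feats = ENDINGS[ending]
            stem = word[: -len(ending)]
            return Tagged(word, pos, [(stem, "STEM"), (ending, feats)])
    # No known ending: the word is left untagged at both levels.
    return Tagged(word, "UNK", [(word, "UNK")])


def tag_sentence(sentence: str) -> List[Tagged]:
    """Whitespace tokenization only; real Sanskrit text also needs sandhi splitting."""
    return [tag_word(w) for w in sentence.split()]


if __name__ == "__main__":
    # "rAmaH vanam gacchati" = "Rama goes to the forest"
    for t in tag_sentence("rAmaH vanam gacchati"):
        print(t.word, t.pos, t.morphemes)
```

A practical tagger would need a much larger inventory of nominal and verbal endings, a sandhi splitter before tokenization, and some way (for example, context or statistics) to choose among ambiguous analyses such as the `-m` ending above.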
Similar Resources
SanskritTagger: A Stochastic Lexical and POS Tagger for Sanskrit
SanskritTagger is a stochastic tagger for unpreprocessed Sanskrit text. The tagger tokenises text with a Markov model and performs part-of-speech tagging with a Hidden Markov model. Parameters for these processes are estimated from a manually annotated corpus of currently about 1,500,000 words. The article sketches the tagging process, reports the results of tagging a few short passages of Sans...
ANN and Rule Based Model for English to Sanskrit Machine Translation
Developing a machine translation system for an ancient language such as Sanskrit is a fascinating and challenging task. Owing to the lack of a linguistic community, little work has been done on Sanskrit translation, even though Sanskrit is regarded as a mother language by virtue of its importance to the cultural heritage of India. In this paper, we integrate a traditional rule based approach of machine tr...
Coarse Semantic Classification of Rare Nouns Using Cross-Lingual Data and Recurrent Neural Networks
The paper presents a method for WordNet supersense tagging of Sanskrit, an ancient Indian language with a corpus grown over four millennia. The proposed method merges lexical information from Sanskrit texts with lexicographic definitions from Sanskrit-English dictionaries, and compares the performance of two machine learning methods for this task. Evaluation concentrates on Vedic, the oldest lay...
Part-of-Speech Tagging of the Persian Language Using a Fuzzy Network Model
Part-of-speech tagging (POS tagging) is an ongoing area of research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
Morpheme Based Parts of Speech Tagger for Kannada Language
Parts-of-speech tagging is the process of assigning appropriate parts-of-speech tags to the words in a given text. The crucial information needed to tag a word comes from its internal structure rather than from its neighboring words. The internal structure of a word comprises its morphological features and grammatical information. This paper presents a morpheme based parts of spee...